METHODS FOR ANALYSIS OF BIOLOGICAL DATASET PROFILES

Field of the Invention 

[01] The present invention relates to the analysis of cellular pathways, and
more particularly to methods and algorithms for identifying the pathways in which a
particular agent acts, allowing the identification of mechanisms of drug action and gene
function. Interactions between pathways and functional relationships of components within
pathways can be identified. Software and methods for evaluating correlations between
biological datasets are provided.



Background of the Invention 

[02] Knowledge of the biochemical pathways by which cells detect and respond to stimuli
is important for the discovery, development, and correct application of pharmaceutical
products. Cellular physiology involves multiple pathways, which have complex
relationships. For example, pathways split and join; there are redundancies in performing
specific actions; and response to a change in one pathway can modify the activity of
another pathway. In order to understand how a candidate agent is acting and whether it will
have the desired effect, the end result and the effect on pathways of interest are as
important as knowing the target protein.

[03] Methods for determining the pathways affected by an agent or genotype
modification in a cell, and for identifying common modes of operation between agents and
genotype modifications, are described in International Patent application WO01/067103.
Cells capable of responding to factors, simulating a state of interest, are employed.
Preferably the cells are primary cells in biologically relevant contexts. A sufficient number of
factors are employed to involve a plurality of pathways and a sufficient number of
parameters are selected to provide an informative dataset. The data resulting from the
assays can be processed to provide robust comparisons between different environments
and agents.

[04] The application of statistical methods to the analysis of complex datasets can
provide a means to determine correlations and identities, or the lack thereof. Logistic
models can be combined with discriminant analysis to consider the interactions among the
dataset parameters, and to provide statistical models that are effective in determining
identity among datasets.

[05] There is an ongoing need in the art to generate better and more useful ways for
statistical analysis of the large volume of biological response data generated by compound
and genetic screening. Methods providing statistically meaningful models for such
screening methods provide a means of addressing this issue.

Summary of the Invention

[06] The present invention provides methods, software, and systems for evaluating
biological dataset profiles, where datasets comprising information for multiple cellular
parameters are compared and identified. In a preferred embodiment of the invention, the
dataset is a BioMAP® dataset. A typical dataset comprises readouts from multiple cellular
parameters resulting from exposure of cells to biological factors in the absence or presence
of a candidate agent, where the agent may be a genetic agent, e.g. expressed coding
sequence; or a chemical agent, e.g. drug candidate. Datasets may be control datasets, or
test datasets, or profile datasets that reflect the parameter changes of known agents. For
analysis of multiple context-defined systems, the output data from multiple systems may be
concatenated.

[07] In one embodiment of the invention, a prediction envelope is generated for a control
dataset, which prediction envelope provides upper and lower limits for experimental
variation in parameter values. The prediction envelope(s) may be stored in a computer
database for retrieval by a user, e.g. in a comparison with a test dataset.

[08] In another embodiment of the invention, the prediction envelope for a control dataset
provides the basis for determining whether a test dataset is different from a control or profile
dataset, with a predefined level of statistical significance.

[09] In another embodiment of the invention, a database of trusted profile datasets is
established. To obtain a trusted profile for an agent X, repeats of profiles from N
experiments are averaged. Repeats of the profile for agent X that have not been averaged
are classified, and the classification error is plotted as a function of the number of profiles
used to obtain the average. This establishes the number of repeats required to minimize
the misclassification error. Trusted profiles are generated by averaging a number of
repeats sufficient to minimize misclassification error. The database of trusted profiles is
typically stored in a computer for retrieval by a user, and provides a basis for identification
of test profiles.

Brief Description of the Drawings 
[10] Figure 1: Control envelope at 92% prediction level obtained without control data
centering.

[11] Figure 2: Control envelope at 92% prediction level with control data centering.

[12] Figure 3: Testing BioMAP gene over-expression profiles for significance. Profile
x1241 is significant at 95% significance level.

[13] Figure 4: Misclassification error as a function of experimental and well repeats.
Clearly, three experimental repeats are sufficient for error minimization.

[14] Figure 5: Searching "trusted" profiles with the profile for Flurbiprofen. Top five
candidates are different concentrations of the agents Flurbiprofen, Budesonide, FR122047,
all prostaglandin inhibitors.

[15] Figure 6: Pairwise correlation coefficient (Pearson) for a set of compounds.

[16] Figure 7: Pairwise correlation coefficient (Pearson) for a set of compounds after
thresholding and clustering using MDS/pivoting.

[17] Figure 8: Pairwise correlation coefficient for gene over-expression profiles after
thresholding for significance and clustering (MDS/pivoting).

[18] Figure 9a: Network representation of compound set tested in HuVEC-PBMC system.
The network is obtained by applying MDS on the correlation matrix of Figure 6, in 2
dimensions. Figure 9b: Two dimensional network of genes that are members of 4
pathways. The network is obtained by applying 2D MDS to the pairwise correlation
coefficients of Figure 8. Figure 9c: Three dimensional network of genes that are members
of 4 pathways. The network is obtained by applying 3D MDS to the pairwise correlation
coefficients of Figure 8.

[19] Figure 10. Response profiles induced in endothelial cells over-expressing selected
genes and stimulated with pro-inflammatory cytokines. Endothelial cells transduced with
retroviral vectors expressing the genes TNFRSF1A, MYD88 and RAS* were treated with
IL-1β, TNF-α, IFN-γ or media alone (Control). The relative levels of readout parameters
(CD31, E-selectin etc.) were measured by ELISA. Data presented are log expression ratios
(see Methods) from three (TNFRSF1A, RAS*) or four (MYD88) repeat experiments. The
black line representing the overall shape of each profile connects the mean values of the
data points.

[20] Figure 11. Functional classification of genes in multiple cellular contexts. (a)
Endothelial cells transduced with retroviral vectors expressing the genes listed to the right
were treated with IL-1β, TNF-α, IFN-γ or media alone (Control). Figure shows relative
increase (red), decrease (green) or lack of change (black) in the mean log expression ratio
of each parameter relative to non-transduced cells in two to four experiments. (b) Pairwise
Pearson correlation analysis of gene-specific profiles using the combined 28 parameter
profile comprising all seven readouts from each of the four cellular systems (cells+cytokine-
defined contexts) combined into a single datastring for calculations. Positive correlation is
shown in blue and negative correlation in yellow. The order of genes in the figure was
automatically determined by multidimensional scaling of the Pearson correlation metric (see
Methods). (c-f) Two-dimensional representations of the functional similarity of gene profiles
revealed in each individual system: cells in medium alone (c); IL-1β-treated cells (d);
TNF-α-treated cells (e); and IFN-γ-treated cells (f). Pearson correlation analysis was
performed as before, using the seven readouts within a given system, and multidimensional
scaling was used to represent the extent of similarity of gene activities in the systems
indicated. Only genes whose responses showed significant similarity to other genes in the
indicated system are shown. In (g), the relationships revealed by combined systems analysis
are shown. In this case, the 28 parameter combined systems profiles (encompassing the 7
readouts from each of the 4 cell systems) were used for correlation analysis and 2
dimensional representation. The arrangement of genes in two dimensions was
automatically determined by multidimensional scaling (see Methods), and statistically
significant correlations are shown by the connecting lines. Genes are color-coded to
indicate participation in common pathways (red: NF-κB; blue: RAS/MAPK; green: IFN-γ;
grey: PI3K/Akt; and white: novel genes).

[21] Figure 12. IL-1 activates the RAS/MAPK pathway through MYD88, stimulating a
MAPK-dependent negative feedback loop modulating endothelial VCAM-1 expression. (a)
Endothelial cells over-expressing MYD88, RAS*, MEK1* or MEK2* were stimulated with
IL-1β, TNF-α or media alone (None), and VCAM-1 expression was measured by ELISA. MEK
inhibitor PD098059 (3.7 nM) or DMSO (0.1%) as buffer control were added to cells one
hour prior to cytokine stimulation. Note that blockade of the RAS/MAPK pathway with
PD098059 increases VCAM-1 expression when the pathway is activated through RAS*,
MEK1*, MEK2* or IL-1/MYD88, but not in cells treated with TNF. Error bars indicate
standard deviation from triplicate samples. (b) Endothelial cells were stimulated with TNF-α
(10 ng/ml), IL-1β (1 ng/ml) or a mixture of TNF and IL-1 (10 ng/ml TNF + 1 ng/ml IL-1), and
VCAM-1 expression was measured by ELISA. Note that IL-1 modulates the VCAM-1
expression induced by TNF. (c) Endothelial cells were co-transduced with RAS*+empty
vector, RAS*+IKBKB* or RAS*+RELA. Expression of individual genes in co-transduced
cells was confirmed by quantitative RT-PCR. Cells transduced with RAS*+empty vector
were treated with IL-1β to stimulate the NF-κB pathway. In cells transduced with
RAS*+IKBKB* or RAS*+RELA the NF-κB pathway is stimulated by over-expression of
IKBKB* and RELA themselves. Note that RAS* has no effect on VCAM expression in cells
expressing IKBKB* or RELA. (d) Schematic diagram of the interactions between the NF-κB
and RAS/MAPK pathways in endothelial cells. Genes are color coded according to the
pathways to which they belong (red: NF-κB; blue: RAS/MAPK). The split coloration of
MYD88 and IRAK1 genes indicates that they participate in both pathways. Red dotted lines
represent novel pathway interactions revealed by the present study.

[22] Figures 13A and 13B depict a graphic of a network model, where multiple views
can be presented in three dimensions, and where each window may have the model
representing different information. In Figure 13A the color indicates the compound
identification number, and size indicates the test concentration. In Figure 13B the size
indicates an effect on VCAM expression.




WO 2004/094992 PCT/US2004/012688

[23] Figure 14 depicts a graphic of a network model where the information about the
compound class is conveyed by the color.

[24] Figures 15A-15E depict a graphic of a network model using neighborhood
filtering. In 15A the view is shown without neighborhood filtering. 15B-15E show the
change in view. Initially the far cluster is not visible, looking in the neighborhood of the
gray-blue cluster, but in approaching the former, the color changes and it is brought into
view.



Detailed Description of the Embodiments 

[25] Biological datasets are analyzed to determine statistically significant matches
between datasets, usually between test datasets and control, or profile datasets.
Comparisons may be made between two or more datasets, where a typical dataset
comprises readouts from multiple cellular parameters resulting from exposure of cells to
biological factors in the absence or presence of a candidate agent, where the agent may be
a genetic agent, e.g. expressed coding sequence; or a chemical agent, e.g. drug candidate.

[26] A prediction envelope is generated from the repeats of the control profiles, which
prediction envelope provides upper and lower limits for experimental variation in parameter
values. The prediction envelope(s) may be stored in a computer database for retrieval by a
user, e.g. in a comparison with a test dataset.

[27] Using multidimensional scaling methods, relationships between components, e.g.
proteins in a biological pathway; relationships between pathways; etc., are graphically
displayed, to aid in the identification of such relationships.

[28] In one embodiment of the invention, the analysis methods provided herein are used
in the determination of functional homology between two agents. As used herein, the term
"functional homology" refers to determination of a similarity of function between two
candidate agents, e.g. where the agents act on the same target protein, or affect the same
pathway. Functional homology may also distinguish compounds by the effect on secondary
pathways, i.e. side effects. In this manner, compounds or genes that are structurally
dissimilar may be related with respect to their physiological function. Parallel analyses
allow identification of compounds with statistically similar functions across systems tested,
demonstrating related pathway or molecular targets. Multi-system analysis can also reveal
similarity of functional responses induced by mechanistically distinct drugs.

[29] In a preferred embodiment, the datasets of information are obtained from biologically
multiplexed activity profiling (BioMAP®), which methods are described, for example, in U.S.
Patent no. 6,656,695; in co-pending U.S. provisional patent application 60/465,152, filed
April 23, 2003; and U.S. patent applications USSN 09/962,744, filed September 13, 2001;
USSN 10/220,999; and USSN 10/236,558, filed September 5, 2002, herein each specifically
incorporated by reference. Briefly, the methods provide screening assays for biologically
active agents, where the effect of altering the environment of cells in culture is assessed by
monitoring multiple output parameters. The result is a dataset that can be analyzed for the
effect of an agent on a signaling pathway, for determining the pathways in which an agent
acts, for grouping agents that act in a common pathway, for identifying interactions between
pathways, and for ordering components of pathways.

[30] The data from a typical "system", as used herein, provides a single cell type or cell
types (where there are multiple cells present in a well) in an in vitro culture condition.
Primary cells are preferred, to avoid potential artifacts introduced by cell lines. In a system,
the culture conditions provide a common biologically relevant context. Each system
comprises a control, e.g. the cells in the absence of the candidate biologically active agent.
The samples in a system are preferably provided in triplicate, and may comprise one, two,
three or more triplicate sets.

[31] As used herein, the biological context refers to the exogenous factors added to the
culture, which factors stimulate pathways in the cells. Numerous factors are known that
induce pathways in responsive cells. By using a combination of factors to provoke a cellular
response, one can investigate multiple individual cellular physiological pathways and
simulate the physiological response to a change in environment.

[32] A BioMAP® dataset comprises values obtained by measuring parameters or
markers of the cells in a system. Each dataset will therefore comprise parameter output
from a defined cell type(s) and biological context, and will include a system control. As
described above, each sample, e.g. candidate agent, genetic construct, etc., will generally
have triplicate data points; and may be multiple triplicate sets. Datasets from multiple
systems may be concatenated to enhance sensitivity, as relationships in pathways are
strongly context-dependent. It is found that concatenating multiple datasets by
simultaneous analysis of 2, 3, 4 or more systems will provide for enhanced sensitivity of the
analysis.
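
By way of illustration only, the concatenation of datasets from multiple systems and the
subsequent pairwise comparison may be sketched in Python as follows. The sketch assumes
that each system's dataset is a list of normalized parameter values in a fixed readout
order. The system names, the numerical values, and the helper names concatenate_profiles
and pearson are hypothetical and are not taken from the specification.

```python
import math

def concatenate_profiles(profiles_by_system, system_order):
    """Join per-system parameter vectors into one combined profile."""
    combined = []
    for system in system_order:
        combined.extend(profiles_by_system[system])
    return combined

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical normalized values for two agents measured in two systems.
systems = ["control", "IL-1"]
agent_a = {"control": [0.1, -0.2, 0.0], "IL-1": [0.5, -0.4, 0.3]}
agent_b = {"control": [0.2, -0.1, 0.0], "IL-1": [0.6, -0.5, 0.2]}

combined_a = concatenate_profiles(agent_a, systems)
combined_b = concatenate_profiles(agent_b, systems)
similarity = pearson(combined_a, combined_b)
```

The same system order must be used for every agent, so that each position in the combined
profile refers to the same readout in the same context.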

[33] By referring to a BioMAP®, or functional profile, it is intended that the dataset will
comprise values of the levels of at least two sets of parameters, preferably at least three
parameters, more preferably 4 parameters, and may comprise five, six or more parameters.
Preferably, a small set of about 3 to 6 biologically relevant parameters is measured.

[34] In many cases the literature has sufficient information to establish the system
conditions to provide a useful functional profile. Where the information is not available, by
using the procedures described in the literature for identifying markers for diseases, using
subtraction libraries, microarrays for RNA transcription comparisons, proteomic or
immunologic comparisons, between normal cells and cells in the physiologic state of interest,
using knock-out and knock-in animal models, using model animals that simulate the
physiological state, by introducing cells or tissue from one species into a different species
that can accept the foreign cells or tissue, e.g. immunocompromised host, one can
ascertain the endogenous factors associated with the physiologic state and the markers that
are produced by the cells associated with the physiologic state.

[35] The parameters may be optimized by obtaining a system dataset, and using pattern
recognition algorithms and statistical analyses to compare and contrast different parameter
sets. Parameters are selected that provide a dataset that discriminates between changes in
the environment of the cell culture known to have different modes of action, i.e. the biomap
is similar for agents with a common mode of action, and different for agents with a different
mode of action. The optimization process allows the identification and selection of a
minimal set of parameters, each of which provides a robust readout, and that together
provide a biomap that enables discrimination of different modes of action of stimuli or
agents. The iterative process focuses on optimizing the assay combinations and readout
parameters to maximize efficiency and the number of signaling pathways and/or functionally
different cell states produced in the assay configurations that can be identified and
distinguished, while at the same time minimizing the number of parameters or assay
combinations required for such discrimination.

[36] Parameters are quantifiable components of cells. A parameter can be any cell
component or cell product including cell surface determinant, receptor, protein or
conformational or posttranslational modification thereof, lipid, carbohydrate, organic or
inorganic molecule, nucleic acid, e.g. mRNA, DNA, etc. or a portion derived from such a cell
component or combinations thereof. While most parameters will provide a quantitative
readout, in some instances a semi-quantitative or qualitative result will be acceptable.
Readouts may include a single determined value, or may include mean, median value or
the variance, etc.

[37] Markers are selected to serve as parameters based on the following criteria, where
any parameter need not have all of the criteria: the parameter is modulated in the
physiological condition that one is simulating with the assay combination; the parameter is
modulated by a factor that is available and known to modulate the parameter in vitro
analogous to the manner it is modulated in vivo; the parameter has a robust response that
can be easily detected and differentiated; the parameter is secreted or is a surface
membrane protein or other readily measurable component; the parameter desirably
requires not more than two factors to be produced; the parameter is not co-regulated with
another parameter, so as to be redundant in the information provided; and in some
instances, changes in the parameter are indicative of toxicity leading to cell death. The set
of parameters selected is sufficiently large to allow distinction between datasets, while
sufficiently selective to fulfill computational requirements.

[38] Parameters of interest include detection of cytoplasmic, cell surface or secreted
biomolecules, frequently biopolymers, e.g. polypeptides, polysaccharides, polynucleotides,
lipids, etc. Cell surface and secreted molecules are a preferred parameter type as these
mediate cell communication and cell effector responses and can be more readily assayed.
In one embodiment, parameters include specific epitopes. Epitopes are frequently identified
using specific monoclonal antibodies or receptor probes. In some cases the molecular
entities comprising the epitope are from two or more substances and comprise a defined
structure; examples include combinatorially determined epitopes associated with
heterodimeric integrins. A parameter may be detection of a specifically modified protein or
oligosaccharide, e.g. a phosphorylated protein, such as a STAT transcriptional protein; or
sulfated oligosaccharide, or such as the carbohydrate structure Sialyl Lewis x, a selectin
ligand. The presence of the active conformation of a receptor may comprise one parameter
while an inactive conformation of a receptor may comprise another, e.g. the active and
inactive forms of heterodimeric integrin αMβ2 or Mac-1.

[39] Candidate biologically active agents may encompass numerous chemical classes,
primarily organic molecules, which may include organometallic molecules, inorganic
molecules, genetic sequences, etc. An important aspect of the invention is to evaluate
candidate drugs, select therapeutic antibodies and protein-based therapeutics with
preferred biological response functions. Candidate agents comprise functional groups
necessary for structural interaction with proteins, particularly hydrogen bonding, and
typically include at least an amine, carbonyl, hydroxyl or carboxyl group, frequently at least
two of the functional chemical groups. The candidate agents often comprise cyclical carbon
or heterocyclic structures and/or aromatic or polyaromatic structures substituted with one or
more of the above functional groups. Candidate agents are also found among
biomolecules, including peptides, polynucleotides, saccharides, fatty acids, steroids,
purines, pyrimidines, derivatives, structural analogs or combinations thereof.

[40] Included are pharmacologically active drugs, genetic agents, etc. Compounds of
interest include chemotherapeutic agents, anti-inflammatory agents, hormones or hormone
antagonists, ion channel modifiers, and neuroactive agents. Exemplary of pharmaceutical
agents suitable for this invention are those described in "The Pharmacological Basis of
Therapeutics," Goodman and Gilman, McGraw-Hill, New York, New York, (1996), Ninth
edition, under the sections: Drugs Acting at Synaptic and Neuroeffector Junctional Sites;
Drugs Acting on the Central Nervous System; Autacoids: Drug Therapy of Inflammation;
Water, Salts and Ions; Drugs Affecting Renal Function and Electrolyte Metabolism;
Cardiovascular Drugs; Drugs Affecting Gastrointestinal Function; Drugs Affecting Uterine
Motility; Chemotherapy of Parasitic Infections; Chemotherapy of Microbial Diseases;
Chemotherapy of Neoplastic Diseases; Drugs Used for Immunosuppression; Drugs Acting
on Blood-Forming Organs; Hormones and Hormone Antagonists; Vitamins, Dermatology;
and Toxicology, all incorporated herein by reference. Also included are toxins, and
biological and chemical warfare agents, for example see Somani, S.M. (Ed.), "Chemical
Warfare Agents," Academic Press, New York, 1992.

[41] The term "genetic agent" refers to polynucleotides and analogs thereof, which
agents are tested in the screening assays of the invention by addition of the genetic agent
to a cell. Genetic agents may be used as a factor, e.g. where the agent provides for
expression of a factor. Genetic agents may also be screened, in a manner analogous to
chemical agents. The introduction of the genetic agent results in an alteration of the total
genetic composition of the cell. Genetic agents such as DNA can result in an
experimentally introduced change in the genome of a cell, generally through the integration
of the sequence into a chromosome. Genetic changes can also be transient, where the
exogenous sequence is not integrated but is maintained as an episomal agent. Genetic
agents, such as antisense oligonucleotides, can also affect the expression of proteins
without changing the cell's genotype, by interfering with the transcription or translation of
mRNA. The effect of a genetic agent is to increase or decrease expression of one or more
gene products in the cell.

[42] Agents are screened for biological activity by adding the agent to cells in the system;
and may be added to cells in multiple systems. The change in parameter readout in
response to the agent is measured to provide the BioMAP® dataset.

Prediction Envelopes 

[43] In order to identify profiles that show an effect of the test agent (compound, gene,
biologic and/or combinations) in a system, a statistical test will provide a confidence level for
a change in the parameters between the test and control profiles to be considered
significant. A set of methods herein termed "control prediction envelope" is utilized. This
set of methods uses the experimentally measured control profiles to create upper and lower
limits for the level of variation of parameter values that one would expect in a subsequent
experiment. These limits can be established at any level of statistical significance provided
that enough experimental profiles are available.

[44] The raw data may be initially analyzed by measuring the values for each parameter,
usually in triplicate or in multiple triplicates. For each agent in a system, the mean value for
each parameter is calculated, and divided by the mean parameter value from a negative
control sample to generate a ratio. The ratios are then log10 transformed. The transformed
ratios may be averaged from repeat experiments of a system. The dataset thus obtained
may be referred to as a normalized biomap dataset.
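
By way of illustration only, the normalization steps described above may be sketched in
Python as follows. The sketch assumes a base-10 logarithm and assumes that replicate wells
are supplied as lists keyed by parameter name. The function names normalize_profile and
average_experiments, and the example readout names, are hypothetical.

```python
import math
from statistics import mean

def normalize_profile(agent_wells, control_wells):
    """Return log10 ratios of mean agent readout to mean control readout.

    Both arguments map a parameter name to its replicate measurements
    (for example triplicate wells). Base 10 is an assumption of this sketch.
    """
    profile = {}
    for name, replicates in agent_wells.items():
        ratio = mean(replicates) / mean(control_wells[name])
        if ratio <= 0:
            raise ValueError(f"non-positive ratio for parameter {name}")
        profile[name] = math.log10(ratio)
    return profile

def average_experiments(profiles):
    """Average normalized profiles from repeat experiments, per parameter."""
    names = list(profiles[0].keys())
    return {name: mean(p[name] for p in profiles) for name in names}

# Hypothetical triplicate readouts for one agent and the negative control.
agent = {"VCAM-1": [2.0, 2.0, 2.0], "CD31": [1.0, 1.0, 1.0]}
control = {"VCAM-1": [0.2, 0.2, 0.2], "CD31": [1.0, 1.0, 1.0]}
normalized = normalize_profile(agent, control)
```

A parameter that is unchanged relative to the negative control has a log ratio of zero,
so that increases and decreases of equal fold change are symmetric about zero.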



[45] The "prediction envelope" methodology provides a non-parametric approach for
establishing significance of an agent profile. Methods of generating a prediction
envelope may include a non-centered "prediction envelope"; centered "prediction envelope";
centered "prediction envelope" based on Hotelling's method; and the like.

[46] For a non-centered "prediction envelope" method, profiles that correspond to the
control from many experiments are collected. These profiles contain a number of
parameter values, each of which may
be the individual measurement from a well, the average of the replicates measured in the
experiment, the median of the replicates, etc. Figure 1 presents a set of such profiles that
are composed from the values of eight readouts measured in experiments of HUVEC cells
stimulated with IL-1/TNF/IFN-γ. Visually, a 1-standard deviation envelope may be created
around the profile of the combined means by connecting the points that correspond to the
values of one standard deviation for each of the measured values for the parameters.

[47] These two "envelope" lines are then moved parallel to themselves, by equal
amounts, until most of the control profiles are contained completely
within them and a user specified number has at least one of the measured parameters
outside them. The prediction level of the envelope is specified as the percentage of control
curves that are completely contained within the "prediction envelope". The method provides
two important advantages: a) it does not a priori assume any statistical distribution of the
experimental values and b) the method is able to self-adjust as more experimental data
become available.
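
By way of illustration only, the construction of a non-centered prediction envelope may be
sketched in Python as follows. Rather than moving the two lines step by step, the sketch
computes for each control profile the shift that the 1-standard deviation envelope would
need in order to contain it, and takes the shift at the requested prediction level. The use
of the sample standard deviation, the numerical tolerance, and the function names
fit_envelope and is_outside are assumptions of this sketch.

```python
import math
from statistics import mean, stdev

def fit_envelope(control_profiles, level=0.92):
    """Return (lower, upper) limits containing `level` of the control profiles."""
    n = len(control_profiles)
    if n < 2:
        raise ValueError("need at least two control profiles")
    columns = list(zip(*control_profiles))
    centre = [mean(col) for col in columns]
    spread = [stdev(col) for col in columns]
    # Shift that the 1-SD envelope needs in order to contain each profile fully.
    needed = sorted(
        max(abs(x - c) - s for x, c, s in zip(profile, centre, spread))
        for profile in control_profiles
    )
    # Number of profiles that must lie inside (small offset guards rounding).
    k = max(1, math.ceil(level * n - 1e-9))
    shift = needed[k - 1]
    lower = [c - s - shift for c, s in zip(centre, spread)]
    upper = [c + s + shift for c, s in zip(centre, spread)]
    return lower, upper

def is_outside(profile, lower, upper, tol=1e-9):
    """True if any parameter of the profile lies outside the envelope."""
    return any(x < lo - tol or x > hi + tol
               for x, lo, hi in zip(profile, lower, upper))

# Hypothetical control profiles with two readouts each.
controls = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1], [0.2, 0.0], [-0.2, 0.0]]
lower, upper = fit_envelope(controls, level=0.8)
```

A test profile for which is_outside returns True has at least one parameter beyond the
limits expected for the control at the chosen prediction level. Refitting with additional
control profiles updates the limits, which reflects the self-adjusting property noted above.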

[48] Creating a centered "prediction envelope" requires the use of two sets of control
replicates on each plate. These replicates provide a variability estimate for the combination
of system and readout measurement on the given plate. Each set provides a point estimate
of the parameter value. This point estimate can be obtained as the mean of the replicates,
the median, etc. The overall mean of the two points is calculated and subtracted from the
two point estimates, thus centering the points around zero. Combining the points from all
parameters of an experiment, one obtains a profile (symmetric lines around zero)
representing an estimate of the control variability for the given experiment. Similar profiles
from many experiments are used to create a "centered prediction envelope" using
methodology identical to the one employed previously. A typical example of the
construction of a control prediction envelope is presented in Figure 2. This method can be
further extended by using more than 2 sets of points per plate for estimating the control
variability. In this case, the three or more curves that provide the variability estimate will be
centered by subtracting the overall mean curve, before adding them to the curves from
other experiments for creating the "prediction envelope".

[49] An advantage of this approach is that by constructing envelopes based on curves
that represent variability from a mean control value, the effect of the absolute OD value bias
of each experiment is minimized.
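
By way of illustration only, the centering of control point estimates may be sketched in
Python as follows. The sketch assumes that each set of control replicates on a plate has
already been reduced to one point estimate per parameter. The function names
centre_control_sets and pool_centred_profiles are hypothetical.

```python
from statistics import mean

def centre_control_sets(point_estimates):
    """Centre the control profiles from one plate around zero.

    point_estimates holds one profile per set of control replicates, each
    profile being a list with one point estimate per parameter.
    """
    if len(point_estimates) < 2:
        raise ValueError("need at least two sets of control replicates")
    overall = [mean(col) for col in zip(*point_estimates)]
    return [[x - m for x, m in zip(profile, overall)]
            for profile in point_estimates]

def pool_centred_profiles(experiments):
    """Collect centred profiles from many experiments into one list."""
    pooled = []
    for point_estimates in experiments:
        pooled.extend(centre_control_sets(point_estimates))
    return pooled

# Two hypothetical control sets from one plate, two parameters each.
centred = centre_control_sets([[1.0, 2.0], [3.0, 6.0]])
```

With two sets per plate the centred profiles are mirror images about zero, which
corresponds to the symmetric lines described above. The pooled profiles can then be passed
to the same envelope construction used for the non-centered method.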

[50] "Centered prediction envelope" based on Hotelling's T2 method: This method
again uses centered profiles of estimated variability, transforming them into an equivalent
single "distance" value. Centered profiles from multiple experiments are collected and the
covariance matrix of the set is calculated. Then, forming the quadratic form of the profile
vector and the covariance matrix, we obtain a single numerical value that represents the
"distance" of each control profile from the "center" of all control profiles. An empirical
distribution of these distances, which represent the variability of the control profile across
many experiments, is obtained. This distribution provides the means of predicting the
expected variability of the control in a subsequent experiment at a predefined prediction
level. This methodology has the added advantage of accounting for the possible
covariance of the readouts comprising the profile.
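
By way of illustration only, the distance calculation may be sketched in Python as follows.
The sketch assumes the usual form of the Hotelling statistic, in which the quadratic form
is taken with the inverse of the sample covariance matrix, and it obtains the prediction
limit as an empirical quantile of the control distances. The function names and the small
linear solver are hypothetical and are included only so that the sketch is self-contained.

```python
import math

def covariance_matrix(profiles):
    """Sample covariance matrix of the profiles (one profile per row)."""
    n, p = len(profiles), len(profiles[0])
    means = [sum(row[j] for row in profiles) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for row in profiles:
        d = [row[j] - means[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j] / (n - 1)
    return cov

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - s) / a[r][r]
    return x

def t2_distance(profile, cov):
    """Quadratic form of a centred profile with the inverse covariance."""
    return sum(v * w for v, w in zip(profile, solve(cov, profile)))

def distance_limit(centred_controls, level=0.95):
    """Empirical distance reached by `level` of the control profiles."""
    cov = covariance_matrix(centred_controls)
    distances = sorted(t2_distance(p, cov) for p in centred_controls)
    k = max(1, math.ceil(level * len(distances) - 1e-9))
    return distances[k - 1], cov

# Hypothetical centred control profiles with two readouts each.
controls = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
limit, cov = distance_limit(controls, level=0.95)
```

A centred test profile whose distance exceeds the limit would be regarded as lying outside
the expected control variability at the chosen prediction level. The number of control
profiles must exceed the number of readouts for the covariance matrix to be invertible.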

Creation of "trusted profiles" database 
[51] Due to the biological variability, a BioMAP® profile may vary from one experiment to
another. In order to create a database of reference profiles, profiles are averaged from
several repeats of an experimental system. The number of repeats that need to be
averaged in order to obtain a "trusted" profile can be obtained through a classification
process.

[52] In one embodiment of the invention, the classification process for creating a trusted
profile is as follows. An initial trusted profile is obtained by averaging N datasets of biomap
profiles from N experiments, where the dataset may comprise a normalized biomap dataset
as described above. The initial trusted profile should include representative samples of the
functional space that needs to be covered.

[53] For each initial trusted profile, the analysis will further include X number of datasets
which comprise similar experimental data to the initial trusted profile and which utilize the
same experimental system, but which have not been included in the averaging process to
generate the initial trusted profile. The X datasets are classified against the initial trusted
profile using a standard classification method, which may include, without limitation,
k-nearest neighbors, neural networks, discriminant analysis, and the like. The classification
error is plotted, e.g. as a function of the number of profiles that are used to obtain the
average profile; number of well repeats; etc. The number of repeats required for minimizing
the classification error is then established by visual inspection; mathematical criteria; etc. A
trusted profile is then generated using the appropriate number of repeats that are required
for minimizing classification error.

[54] Figure 4 presents such a graph. The error is given as a function of the number of
profiles used for obtaining the "trusted" profiles (x-axis), as well as the number of wells used
for each measurement (numbers on the curves). In this example, 3 repeats of the
experimental profile are required to obtain a minimum in the classification error.
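
The relationship between the number of averaged repeats and the classification error may be illustrated with the following Python sketch. It is not the procedure used to produce Figure 4. It assumes NumPy, a nearest-profile (1-nearest-neighbor, Euclidean) classifier as one of the standard methods named above, and entirely synthetic data; all names are hypothetical.

```python
import numpy as np

def classification_error(train, held_out, n_repeats):
    """Nearest-profile classification error when n_repeats datasets are averaged.

    train:    array (n_agents, n_available_repeats, n_params)
    held_out: array (n_agents, n_test_datasets, n_params), not used for averaging
    """
    trusted = train[:, :n_repeats, :].mean(axis=1)      # one trusted profile per agent
    wrong = 0
    for label, datasets in enumerate(held_out):
        # Euclidean distance from every held-out dataset to every trusted profile
        dist = np.linalg.norm(datasets[:, None, :] - trusted[None, :, :], axis=2)
        wrong += int((dist.argmin(axis=1) != label).sum())
    return wrong / (held_out.shape[0] * held_out.shape[1])

# Synthetic example: 5 agents, 7 parameters, noisy repeats (all values hypothetical).
rng = np.random.default_rng(1)
true_profiles = rng.normal(size=(5, 7))
train = true_profiles[:, None, :] + rng.normal(scale=1.0, size=(5, 8, 7))
held_out = true_profiles[:, None, :] + rng.normal(scale=1.0, size=(5, 40, 7))

error_curve = {n: classification_error(train, held_out, n) for n in range(1, 9)}
best_n = min(error_curve, key=error_curve.get)   # smallest n reaching the minimum error
```

Plotting `error_curve` against the number of repeats gives a curve of the kind described for Figure 4, from which the number of repeats may be chosen by inspection or by a criterion such as `best_n`.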

[55] A feature of the invention is the generation of a database of profiles for a variety of
agents, which agents may be compounds, genes, etc. Such a database will typically
comprise trusted profiles as described above, for a number of agents. The agents of
interest in a database may be selected and arranged according to various criteria: the types
of molecules that are tested, e.g. steroids, antibiotics, neurotransmitters, etc.; by the source
of compounds, e.g. environmental toxins, biologically active extracts from a particular
animal or cell, etc.; by the effect of the compound on specific parameter outputs; by
concentration or potency; and the like.

[56] The trusted profiles and databases thereof may be provided in a variety of media to
facilitate their use. "Media" refers to a manufacture that contains the datasets of the present
invention. The datasets can be recorded on computer readable media, e.g. any medium
that can be read and accessed directly by a computer. Such media include, but are not
limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and
magnetic tape; optical storage media such as CD-ROM; electrical storage media such as
RAM and ROM; and hybrids of these categories such as magnetic/optical storage media.
One of skill in the art can readily appreciate how any of the presently known computer
readable media can be used to create a manufacture comprising a recording of the
present database information. "Recorded" refers to a process for storing information on
computer readable medium, using any such methods as known in the art. Any convenient
data storage structure may be chosen, based on the means used to access the stored
information. A variety of data processor programs and formats can be used for storage,
e.g. word processing text file, database format, etc.

[57] As used herein, "a computer-based system" refers to the hardware means, software
means, and data storage means used to analyze the information of the present invention.
The minimum hardware of the computer-based systems of the present invention comprises
a central processing unit (CPU), input means, output means, and data storage means. A
skilled artisan can readily appreciate that any one of the currently available computer-based
systems is suitable for use in the present invention. The data storage means may
comprise any manufacture comprising a recording of the present information as described
above, or a memory access means that can access such a manufacture.

IDENTIFYING SIGNIFICANT DIFFERENCES AND SIMILARITIES

[58] A test agent (gene, compound, biologic and/or combinations) profile is considered to
be different from the control if at least one of the parameter values of the profile exceeds the
"prediction envelope" limits that correspond to a predefined level of significance. The test
for significance depends on the type of "prediction envelope" that is selected. For the
non-centered "prediction envelope", the test agent profile is compared against the envelope
that has been calculated at the predefined significance level.

[59] For the centered "prediction envelope", the ratio of the test agent profile to the control
profile is formed by dividing the corresponding OD values of the agent and the control
parameters. This operation is equivalent to centering the test agent profile in order to make
it compatible with the centered envelope created at a predefined significance level (the
normalization and transformation operations should be identical for consistency). It is
suggested that the envelope be created using log transformed values and that the log of the
ratio of the agent to the control profile be used. An example of such a test is presented in
Figures.
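
The centered test may be sketched as follows in Python. This is an illustration under stated assumptions: the envelope limits are taken as already computed per parameter on the log scale, base-10 logarithms are used, and the OD readings, limits, and function names are hypothetical.

```python
import math

def centered_log_profile(agent_od, control_od):
    """Log10 of the agent/control OD ratio for each parameter."""
    return [math.log10(a / c) for a, c in zip(agent_od, control_od)]

def parameters_outside(profile, lower, upper):
    """Indices of parameters that fall outside the envelope limits."""
    return [i for i, (v, lo, hi) in enumerate(zip(profile, lower, upper))
            if v < lo or v > hi]

# Hypothetical OD readings for three parameters and a hypothetical envelope.
agent_od = [0.50, 1.20, 0.80]
control_od = [0.50, 0.60, 0.80]
lower, upper = [-0.10] * 3, [0.10] * 3

profile = centered_log_profile(agent_od, control_od)
flagged = parameters_outside(profile, lower, upper)
is_different = bool(flagged)   # at least one parameter exceeds the envelope
```

Here only the second parameter, whose OD is doubled relative to the control, falls outside the assumed limits, so the agent profile would be considered different from the control.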

[60] For the third method, the test agent profile is again centered by dividing by the
corresponding control profile, and the quadratic form of the centered profile and the
covariance matrix of the controls is formed. The value obtained from this multiplication is
then compared with the value obtained from the central variance distribution at the required
significance level.

[61] Similarity of profiles is used for establishing functional homology between a new test
agent (compound, biologic, gene and/or combinations) and the profiles in the "trusted
profiles" database.

[62] Figure 5 shows a typical example where a search of the trusted profiles with the
compound Flurbiprofen provides a number of "hits" that are ordered by the degree of
similarity (Pearson correlation) with the search profile. This search produces "hits", the top
five of which are known prostaglandin inhibitors (Flurbiprofen, budesonide, FR122047).
However, while the correlation, e.g. Pearson, Euclidean, etc., provides an ordering of the
potential functionally homologous candidates, it does not provide a way for the user to
decide which of these similarities are significant and not due to chance.

[63] To provide significance ordering, the false discovery rate (FDR) may be determined.
First, a set of null distributions of dissimilarity values is generated. In one embodiment, the
values of observed profiles are permuted to create a sequence of distributions of correlation
coefficients obtained out of chance, thereby creating an appropriate set of null distributions
of correlation coefficients (see Tusher et al. (2001) PNAS 98:5116-21, herein incorporated
by reference). The set of null distributions is obtained by: permuting the values of each
profile for all available profiles; calculating the pairwise correlation coefficients for all profiles;
calculating the probability density function of the correlation coefficients for this permutation;
and repeating the procedure N times, where N is a large number, usually 300. Using the
N distributions, one calculates an appropriate measure (mean, median, etc.) of the count of
correlation coefficients whose values exceed the value (of similarity) that is
obtained from the distribution of experimentally observed similarity values at a given
significance level.

[64] The FDR is the ratio of the number of the expected falsely significant correlations
(estimated from the correlations greater than this selected Pearson correlation in the set of
randomized data) to the number of correlations greater than this selected Pearson
correlation in the empirical data (significant correlations). This cut-off correlation value may
be applied to the correlations between experimental profiles.
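
The permutation procedure and the FDR ratio described above can be sketched in Python using only the standard library. The sketch assumes that the mean is used as the summary of the permutation counts and that only positive correlations above a single cut-off are counted. The profiles, the cut-off, and all function names are hypothetical.

```python
import random
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def count_above(profiles, cutoff):
    """Number of pairwise correlations greater than the cutoff."""
    return sum(pearson(a, b) > cutoff for a, b in combinations(profiles, 2))

def estimate_fdr(profiles, cutoff, n_permutations=300, seed=0):
    """Expected falsely significant correlations divided by observed significant ones."""
    rng = random.Random(seed)
    observed = count_above(profiles, cutoff)
    if observed == 0:
        return float("nan")          # no significant correlations to assess
    null_counts = []
    for _ in range(n_permutations):
        # Permute the values within each profile independently.
        shuffled = [rng.sample(p, len(p)) for p in profiles]
        null_counts.append(count_above(shuffled, cutoff))
    return (sum(null_counts) / n_permutations) / observed

# Hypothetical profiles: the first two are closely related, the third is not.
profiles = [
    [0.1, 0.9, -0.4, 0.3, -0.8, 0.5, 0.0],
    [0.2, 1.0, -0.5, 0.2, -0.7, 0.6, 0.1],
    [0.5, -0.2, 0.1, -0.6, 0.3, 0.0, 0.4],
]
fdr = estimate_fdr(profiles, cutoff=0.9)
```

With real data the number of profiles would be far larger, and the cut-off would be varied until the estimated FDR reaches an acceptable level.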

[65] Using the aforementioned distribution, a level of confidence is chosen for
significance. This is used to determine the lowest value of the correlation coefficient that
exceeds the result that would have been obtained by chance. Using this method, one obtains
thresholds for positive correlation, negative correlation or both. Using the threshold(s), the
user can filter the observed values of the pairwise correlation coefficients and eliminate
those that do not exceed the threshold(s). Furthermore, an estimate of the false positive
rate can be obtained for a given threshold. For each of the individual "random correlation"
distributions, one can find how many observations fall outside the threshold range. This
procedure provides a sequence of counts. The mean and the standard deviation of the
sequence provide the average number of potential false positives and its standard
deviation. Figures 6 and 7 show the results of applying this method to a set of compound
profiles.
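
The thresholding and false positive estimate of paragraph [65] may be sketched as follows. The sketch assumes two-sided thresholds taken as order statistics of the pooled null correlations, with a fraction (1 - confidence) left in each tail. The data and function names are hypothetical.

```python
import statistics

def correlation_thresholds(null_values, confidence=0.995):
    """Lower and upper cut-offs leaving (1 - confidence) of the null in each tail."""
    ordered = sorted(null_values)
    k = int((1.0 - confidence) * len(ordered))
    return ordered[k], ordered[-1 - k]

def filter_significant(observed, lo, hi):
    """Keep only correlations beyond the negative or positive threshold."""
    return [r for r in observed if r < lo or r > hi]

def false_positive_summary(null_sets, lo, hi):
    """Mean and standard deviation of the per-permutation counts outside the thresholds."""
    counts = [sum(r < lo or r > hi for r in values) for values in null_sets]
    return statistics.mean(counts), statistics.stdev(counts)

# Hypothetical pooled null correlations on an even grid from -1 to 1.
null_values = [x / 50 for x in range(-50, 51)]
lo, hi = correlation_thresholds(null_values, confidence=0.9)
kept = filter_significant([0.95, 0.30, -0.85, -0.10], lo, hi)
mean_fp, sd_fp = false_positive_summary([[-0.9, 0.0, 0.9], [0.0, 0.1, 0.2]], lo, hi)
```

In this toy case the thresholds are -0.8 and 0.8, the observed correlations 0.95 and -0.85 are retained, and the two hypothetical permutation sets yield counts of 2 and 0 outside the thresholds.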

[66] Figure 6 presents the pairwise correlation matrix between the different compounds.
In this raw form it is very difficult to identify clusters of compounds that have similar profiles,
or to tell which of these correlations are significantly different from those obtained by
chance. Figure 7 presents the same matrix after a significance threshold of 0.995 has
been applied to the data and a clustering algorithm has been applied to them.

[67] The same method may be applied to a set of gene data, for example as shown in
Figure 8. These pairwise correlation matrices were obtained using Pearson correlation
between profiles that are the results of concatenating BioMAP profiles for four distinct
systems (IL-1, TNF, IFN, 3C). The correlations have been thresholded using the similarity
values that correspond to a 0.995 significance level. This approach proves to be very
successful in clustering together those genes that are known to be members of the same
pathway. The connectivity across pathways is established through only a few nodes,
similarly to what has been observed experimentally.




Clustering 

[68] The data may be subjected to non-supervised hierarchical clustering to reveal
relationships among profiles. For example, hierarchical clustering may be performed,
where the Pearson correlation is employed as the clustering metric. Clustering [...]
enhances the visualization of functional homology similarities and dissimilarities.
Multidimensional scaling (MDS) can be applied in one, two or three dimensions. [...]
then reordered to reflect the result of MDS. In the combination of multidimensional scaling
and pivoting to move high correlations toward the diagonal: for each row in the reordered
pairwise correlation matrix, starting from the first and moving towards the last, the rank of
the correlation coefficients between the diagonal element and the last element on the row
is determined. The columns (and due to symmetry the rows) are then reordered so that the
rank of the correlation coefficients is decreasing from the diagonal towards the limit of the
matrix.

[69] Once the [...] of the nodes is established, the results may be visually displayed for
enhanced information accessibility to a user. In one embodiment, the results are displayed
as a network.

[70] However, hierarchical clustering with a binary comparison method can obscure
significant similarities between compounds that are on different branches of a tree. This
becomes particularly problematic as the number of variables (parameters and systems)
increases. To allow objective evaluation of the significance of all relationships between
compound activities, profile data from all multiple systems may be concatenated, and the
multi-system data compared to each other by pairwise Pearson correlation. The
relationships implied by these correlations may then be visualized by using
multidimensional scaling to represent them in two or three dimensions.

[71] In order to accomplish this, multidimensional scaling is used on the original profiles,
transforming each one of them into a point in 2D or 3D space. The use of MDS for this
operation is preferred because it preserves the relative distance of the nodes. Distances
between agents are representative of their similarities, and lines are drawn between
compounds whose profiles are similar at a level not due to chance.
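
The concatenation, correlation, and scaling steps can be sketched in Python as follows. The text does not specify which MDS variant is used; this sketch assumes classical (Torgerson) MDS on the correlation distance 1 - r, implemented with NumPy. The readout values, the significance cut-off, and all names are hypothetical.

```python
import numpy as np

def classical_mds(distance, n_dims=2):
    """Embed points so that Euclidean distances approximate the given distances."""
    d = np.asarray(distance, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering        # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)               # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_dims]
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

# Hypothetical readouts: 4 agents measured in two systems with 3 parameters each.
system_a = np.array([[0.9, 0.1, -0.5], [0.8, 0.2, -0.4], [-0.3, 0.7, 0.2], [-0.2, 0.6, 0.9]])
system_b = np.array([[0.4, -0.6, 0.0], [0.5, -0.5, 0.1], [0.1, 0.3, -0.7], [-0.4, 0.2, 0.8]])
profiles = np.concatenate([system_a, system_b], axis=1)   # multi-system profiles

corr = np.corrcoef(profiles)            # pairwise Pearson correlations between agents
coords = classical_mds(1.0 - corr, n_dims=2)

threshold = 0.9                          # assumed significance cut-off
edges = [(i, j) for i in range(len(corr)) for j in range(i + 1, len(corr))
         if corr[i, j] > threshold]
```

Each row of `coords` gives the 2D position of one agent, and `edges` lists the pairs that would be joined by a line in the network display.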

[72] In addition to distance visualization, the display of information may include other
classification schemes to aid in analysis. Each point, which represents a test agent in the
comparison matrix, may be arbitrarily assigned features, such as color, size, shape, etc.,
where the assignment provides information about the agent. For example, the size of the
point may represent the concentration of the agent used in the experimental analysis, or
may convey the potency, e.g. IC50, of the agent. Colors and shapes may be used in
various ways, e.g. to represent classes of compounds, such as steroids, lipids,
polypeptides, polynucleotides, and the like; species of origin or gene families for natural
compounds and genetic agents; signaling pathways in which the agent is known to be
active; and the like. Figures 13 and 14 illustrate the use of features to display information.

[73] Such additional information may also be conveyed by the use of multiple
visualization windows. In addition to the graphic display of clustering information, the
windows may contain text annotation of the profile, different spatial views of the matrix,
different features, selected regions, and the like. Figures 13A and 13B illustrate two
windows of the same statistical model.

[74] Three examples of this approach to compound and gene data are given in Figure
9(a-c). Figure 9a shows a 2D network for a compound set tested on a HUVEC-PBMC
system (see Figure 6). Figures 9(b-c) show the pathway interaction network for the genes
involved in four pathways, obtained through the application of the previous method to the
composite BioMAP profile (4 systems).

[75] The representation of a profile comparison in 3 dimensional space provides certain
advantages, primarily in the improved ability to represent the distance between agents,
where the distance represents the statistical correlation. The three-dimensional space may
be displayed in one or more windows. Stereo visualization methods find use, e.g. where
the user experience (especially depth perception of the network) is enhanced and better
understanding of the interactions is possible. Stereo visualization requires a combination of
software and hardware that can be readily obtained for today's workstations and
visualization servers.

[76] The distances between points are proportional to correlation distance, but over a
large set of points, the solution is not optimized for every distance, and can create areas of
less accuracy in the representation. To address this issue, the field of view may be
restricted to a portion of the complete set, where the distances are optimized for those
points currently visualized. As the field of view is moved through the 3 dimensional space,
the distances may be recalculated in order to optimize for the new field of view. To provide
for a smoother visualization impression, the recalculation may be performed in anticipation
of the vector movement. The field of view may also comprise a filtering function, e.g. to
convey a fading at the borders of the field; to screen out specific data points; and the like.
The movement through space is shown in Figures 15A to 15E, where the point of view
focuses on a specific subset of the space.

[77] The functional homology analysis may be implemented in hardware or software or a
combination of both. In one embodiment of the invention, a machine-readable storage
medium is provided, the medium comprising a data storage material encoded with machine
readable data which, when using a machine programmed with instructions for using said
data, is capable of displaying any of the datasets and data comparisons of this invention.
Such data may be used for a variety of purposes, such as drug discovery, analysis of
interactions between cellular components, and the like. Preferably, the invention is
implemented in computer programs executing on programmable computers, comprising a
processor, a data storage system (including volatile and non-volatile memory and/or storage
elements), at least one input device, and at least one output device. Program code is
applied to input data to perform the functions described above and generate output
information. The output information is applied to one or more output devices, in known
fashion. The computer may be, for example, a personal computer, microcomputer, or
workstation of conventional design.

[78] Each program is preferably implemented in a high level procedural or object oriented
programming language to communicate with a computer system. However, the programs
can be implemented in assembly or machine language, if desired. In any case, the
language may be a compiled or interpreted language. Each such computer program is
preferably stored on a storage media or device (e.g., ROM or magnetic diskette) readable
by a general or special purpose programmable computer, for configuring and operating the
computer when the storage media or device is read by the computer to perform the
procedures described herein. The system may also be considered to be implemented as a
computer-readable storage medium, configured with a computer program, where the
storage medium so configured causes a computer to operate in a specific and predefined
manner to perform the functions described herein.

[79] A variety of structural formats for the input and output means can be used to input
and output the information in the computer-based systems of the present invention. One
format for an output means ranks test datasets possessing varying degrees of similarity to a
trusted profile. Such presentation provides a skilled artisan with a ranking of similarities and
identifies the degree of similarity contained in the test pattern.

[80] It is to be understood that this invention is not limited to the particular methodology,
protocols, cell lines, and reagents described, as such may vary. It is also to be understood
that the terminology used herein is for the purpose of describing particular embodiments
only, and is not intended to limit the scope of the present invention, which will be limited
only by the appended claims.

[81] As used herein the singular forms "a", "and", and "the" include plural referents
unless the context clearly dictates otherwise. All technical and scientific terms used herein
have the same meaning as commonly understood to one of ordinary skill in the art to which
this invention belongs unless clearly indicated otherwise.

[82] The examples are put forth so as to provide those of ordinary skill in the art with a
complete disclosure and description of how to make and use the subject invention, and are
not intended to limit the scope of what is regarded as the invention. Efforts have been
made to ensure accuracy with respect to the numbers used (e.g. amounts, temperature,
concentrations, etc.) but some experimental errors and deviations should be allowed for.
Unless otherwise indicated, parts are parts by weight, molecular weight is average
molecular weight, temperature is in degrees centigrade, and pressure is at or near
atmospheric.

[83] All publications mentioned herein are incorporated herein by reference for the
purpose of describing and disclosing, for example, the compounds and methodologies that
are described in the publications, which might be used in connection with the presently
described invention. The publications discussed above and throughout the text are
provided solely for their disclosure prior to the filing date of the present application. Nothing
herein is to be construed as an admission that the inventors are not entitled to antedate
such disclosure by virtue of prior invention.

Examples 

[84] Despite extensive efforts to develop improved statistical techniques for predicting
functional networks from large datasets, the transition from whole-cell molecular
measurements to useful models of cellular responses in higher eukaryotes remains
daunting.

[85] The techniques described here take advantage of several features. To reduce
artifacts induced by the use of cell lines, primary human cells cultured in biologically
relevant contexts were used. A small set of biologically relevant parameters was
measured. In addition, the same cell type was studied in multiple different contexts (culture
conditions differing in cell activation stimuli), so as to allow functional characterization of a
wide range of protein activities from a manageably small number of measurements.

[86] To implement this approach, it is important to assemble a set of measurements and
cell systems (cells in different defined contexts) broad enough to encompass most or all of
the signaling pathways relevant to a particular biological process. The responses of these
systems to genetic or other experimental perturbation are then registered by changes in the
selected parameters. Here we show, using vascular endothelium in 4 contexts defined by
stimulation with different pro-inflammatory cytokines, that BioMAP® analysis can predict
functional relationships of proteins within pathways, and reveal interactions between
different pathways that could not have been deduced from analysis of cells in any single
context. BioMAP® analyses will be a useful tool for modeling of the signaling networks
operating in human cells.

Methods

[87] Cytokines, antibodies, and cell culture. Recombinant human IFN-γ, TNF-α and IL-1β
were from R&D Systems. Murine IgG was from Sigma. Mouse anti-human ICAM-1 (clone
84H10) was from Beckman Coulter and mouse anti-human E-selectin (clone ENA1) was from
HyCult Biotechnology. Unconjugated mouse antibodies against human VCAM-1 (clone
51-10C9), CD31 (clone WM-59), HLA-DR (clone G46-6), MIG (clone B8-77), and MCP-1 (clone
5D3-F7) were from BD Biosciences. Mouse anti-human IL-8 (clone 6217.111) was from R&D
Systems. PD098059 was from Calbiochem. EGM-2 medium and required supplements
were from Clonetics. Human umbilical vein endothelial cells (HUVEC) were from Clonetics
and were cultured in microtiter plates in EGM-2 medium containing manufacturer's
supplements plus 2% heat-inactivated fetal bovine serum. Confluent cells were stimulated
with cytokines (1 ng/ml IL-1β, 5 ng/ml TNF-α, or 100 ng/ml IFN-γ) for 24 hours. PD098059
(3 7 final concentration) was added 1 hr before stimulation and was present during the
whole 24 hr stimulation period.

[88] Cell-based ELISAs. Cell-based ELISAs were carried out as previously described.
Briefly, microtiter plates containing treated and stimulated HUVEC were blocked and then
incubated with primary antibodies or isotype control antibodies (0.01-0.5 μg/ml) for 1 hr.
After washing, plates were then incubated with a peroxidase-conjugated anti-mouse IgG
secondary antibody (Promega) for 1 hr. Plates were washed and developed with TMB
substrate (Clinical Science Products) and the optical density (OD) was read at 460 nm
(subtracting the background absorbance at 650 nm) on a SpectraMAX 190 plate reader
(Molecular Devices).

[89] Retroviral gene transduction. Test genes were cloned into a vector derived from the
MoMLV-based vector pFB (Stratagene) downstream of the MoMLV LTR. A truncated form
of the human nerve growth factor receptor (NGFR) preceded by an internal ribosomal entry
site was used as a marker gene. Retroviral vector plasmid DNA was transfected into
AmphoPack-293 cells (Clontech) by a modified calcium phosphate method according to the
manufacturer's protocol (MBS transfection kit, Stratagene). Cell supernatants were
harvested 48 hours post-transfection, filtered to remove cell debris (0.45 μm) and
transferred onto exponentially growing HUVEC. DEAE dextran (10 μg/ml) was added to
facilitate transduction. After 6-8 hr, the viral supernatant was removed and cells were
cultured for an additional 40 hours. Gene transfer efficiency was determined by flow
cytometric analysis using an NGFR-specific monoclonal antibody and was typically ^70%.

[90] Statistical analysis. The value of each parameter was measured three times per
experiment, and two to four experiments were carried out for each over-expressed gene.
Within each experiment, the mean value obtained for each parameter was then divided by
the mean value from a sample transduced with empty vector to generate a ratio. All ratios
were then log10 transformed, the transformed ratios were averaged across repeat
experiments, and non-parametric analysis was used to compare the profile of these ratios
to the envelope of control profiles. Those profiles containing ratio values that exceeded the
95% prediction level envelope for control profiles were used to calculate pairwise Pearson
correlation coefficients (Partek Pro version 5.1). To select statistically significant correlation
coefficients, one hundred randomized datasets were created by permuting the original
expression data, and the pairwise correlation coefficients were calculated for each
randomized set. Correlation limits were then selected so as to exclude all but a defined
minimal number of correlations from the randomized data sets. For the four cellular
environments combined, limits of [-0.5035, 0.546] excluded all but 2.5% of the correlations
derived from the randomized datasets (in other words, at these limits 2.5% of the
correlations observed are potentially false positives). Limits used to filter correlations
obtained in individual cellular environments to the 2.5% false discovery rate were:
IL-1β-treated cells [-0.87, 0.88]; TNF-α-treated cells [-0.87, 0.90]; IFN-γ-treated cells
[-0.86, 0.88]; and control cells [-0.84, 0.89].
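
The normalization steps at the start of this paragraph (ratio to empty vector, log10 transformation, averaging across experiments) can be sketched in Python as follows. The readings are hypothetical, and the function names are illustrative rather than taken from the software actually used.

```python
import math
from statistics import mean

def log_ratio(gene_values, vector_values):
    """log10 of mean(gene-transduced) / mean(empty-vector) for one parameter."""
    return math.log10(mean(gene_values) / mean(vector_values))

def averaged_profile(experiments):
    """Average the log ratios of each parameter across repeat experiments.

    experiments: list of experiments; each is a list with one
    (gene_values, vector_values) pair per parameter.
    """
    per_experiment = [[log_ratio(g, v) for g, v in exp] for exp in experiments]
    return [mean(column) for column in zip(*per_experiment)]

# Hypothetical triplicate readings for two parameters in two experiments.
experiments = [
    [([2.0, 2.0, 2.0], [0.2, 0.2, 0.2]), ([1.0, 1.0, 1.0], [1.0, 1.0, 1.0])],
    [([1.0, 1.0, 1.0], [1.0, 1.0, 1.0]), ([0.1, 0.1, 0.1], [1.0, 1.0, 1.0])],
]
profile = averaged_profile(experiments)
```

For these readings the averaged log10 ratios are approximately 0.5 for the first parameter and -0.5 for the second.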

Table 1. Genes over-expressed

Gene description                                        Gene        GenBank no.
TNF-α receptor type I                                   TNFRSF1A    BC010140
Receptor-interacting serine threonine kinase 1 (RIP)    RIPK1       NM_003804
CD40                                                    TNFRSF5     BC012419
TNF-β (lymphotoxin A)                                   TNFB        D12614
TRAIL receptor 2                                        TNFRSF10B   BC001281
TNF-α                                                   TNFA        NM_000594
I-κB kinase β (IKKB), constitutively active             IKBKB*      AF031416
NF-κB subunit 3 (p65)                                   RELA        NM_021975
IL-1 receptor-associated kinase 1                       IRAK1       BC014963
Hypothetical protein MGC3067                            MGC3067     BC002457
MAP2K1, constitutively active R4F                       MEK1*       NM_002755
MAP2K2, constitutively active K71W                      MEK2*       L11285
Raf1, constitutively active                             RAF*        L00212
H-Ras, constitutively active V12                        RAS*        NM_005343
Myeloid differentiation primary response gene 88        MYD88       NM_002468
Phosphotyrosyl-protein phosphatase (SH-PTP2),           SHP2*       L03535
  dominant negative
Sm-like protein 1 (CASM)                                LSM1        BC001767
IFN-γ                                                   IFNG        NM_000619
MHC class II transactivator (C2TA)                      MHC2TA      NM_000246
Pyrimidinergic receptor P2Y                             P2Y6R       BC000571
TNFR1-associated death domain protein                   TRADD       BC004491
IL-11 receptor α                                        IL11RA      BC003110
AKT1-estrogen receptor fusion, constitutively active    AKT1*       BC000479
  upon tamoxifen treatment
p110 subunit of PI3K, constitutively active             PI3K*       M93252

[91] Significantly anti-correlated profiles were observed as well. The arrangement of
genes in two dimensions was automatically determined from the entire set of correlation
values by a multidimensional scaling method using AT&T GraphViz software; the
statistically significant correlations are highlighted by connecting lines in Fig. 11c-g.

Results

[92] Analysis of endothelial cells over-expressing signaling proteins. Endothelial cells
control vascular inflammation by regulating leukocyte traffic and express immunomodulatory
cytokines and chemokines. To analyze this range of activity, we over-expressed genes
[...] the RAS/MAPK pathway in cultures of primary endothelial cells and stimulated
individual pro-inflammatory pathways (listed in Table 1). Some genes (denoted by an
asterisk) were over-expressed in a constitutively active form to maximize their activity. The
effects were then assessed by measuring the levels of surface proteins known to be
regulated by inflammation and/or to reflect the functional state of the cells, including
VCAM-1, ICAM-1 and E-selectin (vascular adhesion molecules for leukocytes), HLA-DR
(MHC class II, the protein responsible for antigen presentation), MIG/CXCL9 and
IL-8/CXCL8 (chemokines that mediate selective leukocyte recruitment from the blood), and
PECAM-1/CD31 (a protein controlling leukocyte transmigration).

[93] Genes to be over-expressed were introduced into endothelial cells by retroviral
transduction. After waiting 48 hours to ensure that the encoded proteins were expressed,
the cells were incubated for a further 24 hours in the presence of pro-inflammatory
cytokines (IL-1β, TNF-α, or IFN-γ) or medium alone, and levels of readout proteins were
measured by ELISA. Figures 10 and 11a show that the levels of readout proteins were a
function of the gene being over-expressed and of the cell context (presence of
pro-inflammatory cytokines). For example, TNFRSF1A (the gene encoding TNF receptor I)
elicited strong responses in IFN-γ-treated and control cells, whereas RAS* (encoding a
constitutively active form of RAS) was most active in the context of IL-1β- and
TNF-α-treatment (Fig. 10). Figure 11a summarizes the effect of each gene on the level of
each readout protein in the four different cell systems (cells+contexts) employed.

[94] Analysis of gene function by correlating responses. We next asked if the readout
profiles could be used to identify functional relationships between the over-expressed
genes. We initially performed pairwise comparisons of the readout profiles induced by all
over-expressed genes in each individual cell system, measuring the similarity between
profiles using Pearson correlation coefficients (r). The relationships implied by these
correlations were visualized by using multidimensional scaling to represent them in two
dimensions (Fig. 11c-f), drawing lines between pairs of genes whose profiles were
significantly correlated.

[95] Strikingly, the readout profiles of genes with closely related functions were indeed
strongly correlated, but the strength of the correlation was highly dependent on the cell
context. For example, the profiles produced by MEK1* and MEK2* were strongly correlated
in IL-1β- and TNF-α-treated cells (r=0.95 and 0.98, respectively), but the correlation
between the two did not survive significance filtering in IFN-γ-treated or control cells (r=0.69
and 0.68, respectively). Similarly, the profiles produced by TNFA and TNFB were highly
correlated in control cells (r=0.98), but the correlations in IFN-γ-, IL-1β- and TNF-α-treated
cells were not statistically significant (r=0.77, 0.68 and 0.74, respectively).

[96] Context-dependent correlations were also seen between members of the same
signaling pathway. For example, genes encoding members of the NF-κB pathway
(including TNF-α, TNF-β, their receptor TNFRSF1A and the intracellular signaling
molecules RIPK1, IKBKB*, and RELA) all produced correlated profiles in control cells and to
a lesser extent in IFN-γ-treated cells, but not in cells treated with IL-1β or TNF-α. By contrast,
genes encoding members of the RAS/MAPK pathway (including RAS*, RAF*, MEK1*, and
MEK2*) produced correlated profiles in IL-1β- and TNF-α-treated cells, but not in cells
treated with IFN-γ or control cells. Thus, only some of the possible functional relationships
can be mapped in any one cellular context. Conversely, some genes whose products are
known to belong to the same signaling pathway (such as IRAK1 and MYD88, which both
encode key components of the IL-1 signaling pathway, or IFNG, which induces the
transcription of MHC2TA) did not produce significantly correlated responses in any of the
individual cell systems tested.

[97] Enhanced resolution of biological activity in correlations of combined profiles.
Because the functional relationships observed depended so strongly on the cellular context,
we hypothesized that an analysis that simultaneously encompasses the data from multiple
context-defined systems should increase the sensitivity of our approach. We therefore
concatenated the gene-induced readout profiles from the four cellular systems, yielding for
each gene a combined profile comprising 28 normalized parameter readouts (the 7
measured parameters from each of the four systems: no cytokine, IL-1-, TNF- and IFN-γ-
treated endothelial cells). (As examples, the 28 parameter readouts illustrated inside the
rectangles in Fig. 10 comprise the multi-system profiles for the TNF receptor, MYD88 or
RAS*.) We performed pairwise comparisons of these 28-parameter profiles, measuring the
similarity between profiles using Pearson correlations (summarized in Fig. 11b) and
representing the implied relationships in two dimensions as before (Fig. 11g).
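
As a rough illustration of the concatenation step, the sketch below joins four 7-parameter profiles into one 28-parameter profile per gene and correlates two such combined profiles. The system labels, the dictionary layout, and the random placeholder data are assumptions made for the example only.

```python
import numpy as np

# Hypothetical labels for the four cell systems, kept in a fixed order so that
# every gene's combined profile lists the parameters in the same positions.
SYSTEMS = ["no_cytokine", "IL1", "TNF", "IFNg"]

def combined_profile(per_system):
    """Concatenate one gene's per-system readouts into a single vector."""
    return np.concatenate(
        [np.asarray(per_system[s], dtype=float) for s in SYSTEMS]
    )

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    return float(np.corrcoef(x, y)[0, 1])

# Random placeholder readouts: 7 normalized parameters per system.
rng = np.random.default_rng(0)
gene_x = {s: rng.normal(size=7) for s in SYSTEMS}
# A second gene built to respond similarly to the first, plus small noise.
gene_y = {s: gene_x[s] + rng.normal(scale=0.1, size=7) for s in SYSTEMS}

px = combined_profile(gene_x)
py = combined_profile(gene_y)
r_combined = pearson(px, py)
```

Because the correlation is computed over all 28 positions at once, a pair of genes that agrees only weakly within each single system can still accumulate enough agreement across systems to be detected.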

[98] Virtually all the relationships observed in individual systems were still apparent, but
many new relationships could also be detected, including those between IRAK1 and
MYD88 and between IFNG and MHC2TA. The only relationships that were no longer
evident were those previously detected between AKT1 and [...] in cells treated with IFN-γ
and between [...] in control cells. [...]
induced very different responses in other cellular contexts, indicating their distinct [...]
functions: the responses to AKT1 and [...] were generally related to those induced by
PI3K and members of the [...] pathway, respectively, whereas [...] induced [...]
correlated to those produced by any other genes tested. Combining data obtained in
multiple systems improved the specificity as well as the sensitivity.

[99] Novel interactions between signaling pathways. One benefit of the greater detail
revealed by multi-system BioMAP analysis was a much clearer separation [...]
interconnected cluster, in [...]
related to genes encoding members of both the NF-κB and RAS/MAPK pathways,
suggesting that MYD88 and IRAK1 can interact with both of these pathways.

[100] To explore this observation further, we examined the responses to MYD88 and
[...] the RAS/MAPK and NF-κB pathways (RAS*
and TNFRSF1A, respectively) in all four cell systems. As shown in Figure 10, over-
expression of [...] and TNFRSF1A increased E-selectin, ICAM-1, [...] and VCAM-1
[...] consistent with the known ability
[...] and E-selectin. Over-expression [...]
pathway under these conditions.
Blocking the RAS/MAPK pathway by treatment with the MEK inhibitor PD098059 [...]
(Fig. 12a), confirming that the effects [...]
by both genes were mediated by the RAS/MAPK pathway. MYD88 (and IRAK1) are known
to be involved in IL-1-induced but not in TNF-induced signaling, [...]
treating TNF-α-treated cells with low doses of IL-1β did reduce the level of VCAM-1
expression (Fig. 12b), as predicted from the effect of MYD88 in IL-1β-treated cells. The
inhibitory effect of RAS* could be overcome by over-expressing RELA or IKBKB* [...],
indicating that the interaction between the two pathways occurs upstream of IKBKB [...].
A schematic summary is presented in Fig. 12d. Multi-system analysis can thus detect novel
functional interrelationships between different signaling pathways.

[101] Novel pathway participants and mechanisms. BioMAP analysis is also capable of
identifying novel participants in signaling pathways and defining their network interactions.
For example, the intracellular phosphatase SHP2 is known to have a role in growth factor-
induced signaling. In our experiments, however, SHP2* showed clear functional similarity
to members of the NF-κB pathway (Fig. 11g; reflecting, for example, a similar up-regulation
of ICAM-1 and VCAM-1 in control cells, and down-regulation of HLA-DR in IFN-γ-treated
cells), demonstrating that this protein can regulate NF-κB signaling in endothelial cells.
In fibroblasts, SHP2 has indeed been shown to interact physically with the NF-κB complex
and is required for the NF-κB-dependent production of IL-6. Similarly, our studies reveal
similarity of function of the hypothetical protein MGC3067 to IRAK1, MEK1 and MEK2,
leading to the testable hypothesis that it plays a role in the RAS/MAPK pathway.

[102] Multi-system BioMAP analysis also revealed previously unidentified effects of known
genes. TRADD, IL11RA and P2Y6R, for example, all induced unique profiles that were not
significantly related to any known pathway. P2Y6R is a G-protein coupled receptor which
binds uridine diphosphate (UDP). The precise relationship between this activity and the
vascular responses to inflammation remains to be determined, but it is intriguing that P2Y6R
also plays a role in monocyte responses to cytokine stimulation.

[103] The BioMAP® technique we describe represents a simplification of existing
approaches to systems biology. A very wide range of biological behavior can be examined 
by over-expressing signaling proteins in primary cells and evaluating the cells' responses in 
a range of biologically relevant environments. Surprisingly, only a small number of 
measurements from each perturbed cell state is required to reveal a great deal of 
information about the function of the perturbing gene product. Using this approach with 
endothelial cells in several contexts in which inflammatory signaling pathways are activated,
we have rapidly reconstructed key pathway relationships of gene products, correctly
identifying genes involved in several known inflammatory signaling pathways, and also
revealing novel mediators of pathway interactions not previously known in endothelial cells.
In addition, we have identified genes with unique activities in endothelial responses (e.g.,
P2Y6R, IL11RA) and others with activities similar to members of the NF-κB or RAS
pathways (SHP2 and MGC3067, respectively), leading to testable hypotheses about their
pathway interactions. Thus BioMAP® analysis is useful for discovery and characterization
of pathways and pathway interactions, and for defining key nodal and regulatory points in
cell signaling networks.

[104] The BioMAP® approach also allows analysis of signaling networks in other
endothelial processes (e.g., angiogenesis) and in other cell types as well. Application to a
given biology can utilize the empirical selection of systems (cell types and contexts) and
parameters that provide a sufficient sensitivity and diversity of responses to perturbations of
the physiologic processes being studied. In practice, these may be selected iteratively by
evaluating different test sets of cell contexts and parameters for their ability to detect and
discriminate benchmarking agents (e.g., select genes or functional proteins representing
diverse relevant pathways). In the endothelial system we used here, the readout
parameters were chosen to detect and discriminate signaling driven by three key cytokine
drivers of the inflammatory process, IL-1β, TNF-α and IFN-γ, that were also used to define
three of the cell contexts studied. Nevertheless, this set of parameters also revealed the
activity of other known signaling pathways (for example the RAS/MAPK and PI3K/Akt
pathways) as well as that of newly identified pathways (such as signaling through the UDP
receptor P2Y6R or the IL-11 receptor).

[105] This broad sensitivity may be an innate property of complex cellular systems in
which the level and state of each protein are actually an indirect reflection of the interactions
between tens or hundreds of proteins. If we assume that we can experimentally identify
both an appropriate set of readout parameters and a sufficient number of distinct contexts to
capture the responses induced by over-expressing each gene, as few as 10 independent
parameters would be sufficient to generate unique profiles for all human genes. (Assuming
that there are 40,000 genes and that a readout parameter can have 3 states (up, down or
unchanged) allows 3^10 = 59,049 profiles.) In practice, the breadth of pathway coverage and
functional discrimination will depend on the cellular contexts and readout parameters
selected.
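
The counting argument above can be checked with a few lines of arithmetic. This is a minimal sketch; the function names are illustrative, and the figure of 40,000 genes and the three-state readout are the assumptions stated in the text.

```python
def distinct_profiles(n_parameters, n_states=3):
    """Number of distinct profiles when each parameter takes one of n_states values."""
    return n_states ** n_parameters

def min_parameters(n_genes, n_states=3):
    """Smallest number of parameters n such that n_states**n >= n_genes."""
    n = 0
    while n_states ** n < n_genes:
        n += 1
    return n

profiles_with_10 = distinct_profiles(10)   # 3**10 = 59049
profiles_with_9 = distinct_profiles(9)     # 3**9  = 19683
needed = min_parameters(40_000)            # 10
```

Nine three-state parameters give only 19,683 profiles, fewer than 40,000, so ten is the smallest number of independent parameters that could in principle give every gene a unique profile.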

[106] These data clearly show that parallel interrogation of cells in multiple contexts allows
classification of gene function using only a small set of readout parameters. From a
theoretical perspective, it is clear that each gene product, and the network in which it
participates, has evolved not to carry out a function in one particular cell context or
environment; rather, it has evolved to provide appropriate integration of inputs and outputs
from any context the cell may encounter. Thus, the physiologic function of a gene product
can only be defined by its effects within multiple cell contexts. The ability of BioMAP®
analyses to efficiently classify gene function using only a few readouts shows that multi-
system analyses contribute enormously to the biological information content. Indeed, multi-
system analyses may be essential for modeling signaling networks from measurements of
cell states no matter how many parameters are used.

[107] In this study we used specific proteins as readouts, both because these proteins are
directly relevant to the biology of vascular inflammation and because their levels can readily 
be measured in high-throughput assays, but other readouts such as transcript levels could 
certainly be used. Similarly, although the present example uses gene over-expression to
perturb selected pathways, one may carry out a complementary analysis in which gene
activity is suppressed using siRNA, or in which chemical compounds are assessed. Indeed,
compound profiling using the BioMAP technique has recently been shown to be a powerful
tool for characterizing potential drug candidates.

[108] One of the findings in this study is the inhibition of the NF-κB pathway by IL-1,
MYD88, RAS and MEK in primary endothelial cells (Fig. 12d), suggesting that the
RAS/MAPK pathway may help to prevent over-stimulation of the NF-κB pathway and
expression of adhesion molecules in endothelial cells, so moderating immune responses
and leukocyte recruitment. By contrast, RAS has been shown to activate the NF-κB
pathway in transformed fibroblast and epithelial cell lines, suggesting that the same
signaling molecule may have different biological roles in different cell types (or in 
transformed as opposed to primary cells). 

[109] The BioMAP® technique provides an independent system for classifying gene or 
compound function. It is well-suited to large-throughput analyses, and as such will allow a 
'discovery science' approach to defining signaling networks in human cells. By providing
critical insights into functional relationships and networks, BioMAP® analyses will
accelerate the systematic reconstruction of signaling pathways in mammalian cells. The 
present invention, having been described in detail and illustrated by example above, will be 
understood by those of skill in the art, in light of the patent applications, patents, and 
scientific journal references cited herein, all of which are incorporated herein by reference, to
be embodied by the claims that follow. 